Simple and Effective Way for Data Preprocessing Selection Based on Design of Experiments.

Authors

  • Jan Gerretzen
  • Ewa Szymańska
  • Jeroen J Jansen
  • Jacob Bart
  • Henk-Jan van Manen
  • Edwin R van den Heuvel
  • Lutgarde M C Buydens
Abstract

The selection of optimal preprocessing is among the main bottlenecks in chemometric data analysis. Preprocessing currently is a burden, since a multitude of different preprocessing methods is available for, e.g., baseline correction, smoothing, and alignment, but it is not clear beforehand which method(s) should be used for which data set. The process of preprocessing selection is often limited to trial-and-error and is therefore considered somewhat subjective. In this paper, we present a novel, simple, and effective approach for preprocessing selection. The defining feature of this approach is a design of experiments. On the basis of the design, model performance of a few well-chosen preprocessing methods, and combinations thereof (called strategies) is evaluated. Interpretation of the main effects and interactions subsequently enables the selection of an optimal preprocessing strategy. The presented approach is applied to eight different spectroscopic data sets, covering both calibration and classification challenges. We show that the approach is able to select a preprocessing strategy which improves model performance by at least 50% compared to the raw data; in most cases, it leads to a strategy very close to the true optimum. Our approach makes preprocessing selection fast, insightful, and objective.
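The design-based idea described in the abstract can be illustrated with a minimal sketch. It assumes a two-level full factorial design over three preprocessing steps (baseline correction, smoothing, scaling), each switched on or off. The factor names, the helper functions `full_factorial` and `main_effects`, and the `toy_error` response are all hypothetical and are not taken from the paper; in practice the response would be a cross-validated model error computed on the preprocessed spectra.

```python
from itertools import product


def full_factorial(factors):
    """Return every on/off combination of the named factors as a list of dicts."""
    return [dict(zip(factors, levels))
            for levels in product([False, True], repeat=len(factors))]


def main_effects(strategies, responses):
    """Main effect of a factor: mean response with it on minus mean with it off."""
    effects = {}
    for name in strategies[0]:
        on = [r for s, r in zip(strategies, responses) if s[name]]
        off = [r for s, r in zip(strategies, responses) if not s[name]]
        effects[name] = sum(on) / len(on) - sum(off) / len(off)
    return effects


def toy_error(strategy):
    """Hypothetical stand-in for a cross-validated model error (lower is better)."""
    error = 10.0
    if strategy["baseline"]:
        error -= 4.0
    if strategy["smoothing"]:
        error -= 1.0
    if strategy["baseline"] and strategy["scaling"]:
        error -= 2.0  # assumed interaction: scaling only helps after baseline correction
    return error


# Hypothetical preprocessing factors, each either applied or skipped.
factors = ["baseline", "smoothing", "scaling"]
strategies = full_factorial(factors)            # 2**3 = 8 strategies
responses = [toy_error(s) for s in strategies]  # one model error per strategy
effects = main_effects(strategies, responses)
best = strategies[responses.index(min(responses))]

for name, effect in effects.items():
    print(f"{name}: {effect:+.2f}")
print("lowest-error strategy:", best)
```

A negative main effect here means that switching the step on lowers the model error on average. The interaction term in `toy_error` shows why interactions need to be inspected alongside main effects: in this toy response, scaling helps only when baseline correction is also applied.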

Similar resources

Stock Price Forecasting Using a Fuzzy Neural Network Based on a Genetic Algorithm, and Comparison with a Fuzzy Neural Network

In capital markets, stock price forecasting is affected by a variety of factors, such as political and economic conditions and the behavior of investors. Determining all effective factors and the level of their effect on the stock market is very challenging, even with technical and knowledge-based analysis by experts. Hence, investors have encountered challenge, doubt and fault in order to invest with mi...

Improving the Performance of ICA Algorithm for fMRI Simulated Data Analysis Using Temporal and Spatial Filters in the Preprocessing Phase

Introduction: The accuracy of functional MRI (fMRI) data analysis usually decreases in the presence of noise and artifact sources. A common solution for analyzing fMRI data with high noise is to use suitable preprocessing methods to denoise the data. Some effects of preprocessing methods on the parametric methods such as general linear model (GLM) have previously been evalua...

Behavioral Analysis of Traffic Flow for an Effective Network Traffic Identification

Fast and accurate network traffic identification is becoming essential for network management, high quality of service control and early detection of network traffic abnormalities. Techniques based on statistical features of packet flows have recently become popular for network classification due to the limitations of traditional port and payload based methods. In this paper, we propose a metho...

Selecting effective features from Phonocardiography by Genetic Algorithm based on Pearson`s Coefficients Correlation

The heart is one of the most important organs in the body, and is responsible for pumping blood into the valvular systems. Moreover, heart valve disorders are one of the leading causes of death in the world. These disorders are complications in the heart valves that cause the valves to deform or become damaged, and as a result, the sounds caused by their opening and closing compared to a healthy heart....

Comparison of Three Decision-Making Models in Differentiating Five Types of Heart Disease: A Case Study in Ghaem Sub-Specialty Hospital

Introduction: Cardiovascular diseases are becoming the main cause of mortality and morbidity in most countries. The goal of this research was to predict the types of heart disease for more accurate diagnosis using data mining and neural network techniques. Method: This research was an applied-survey study and after data preprocessing, three approaches of neural network, decision making tree and Bayes simp...

Journal title:
  • Analytical Chemistry

Volume 87, Issue 24

Pages  -

Publication date 2015